A 3D nonlinear postprocessing system and method are utilized to reduce coding artifacts
produced by block-based motion-compensated transform coding. In the system and method, a
separable 3D filtering structure is used: space-variant FIR-Median Hybrid filtering (in
106) is used in the spatial domain, followed by motion-compensated (on 112) nonlinear
filtering in the temporal domain (in 110). By using this structure and method, the coding
artifacts in a reconstructed image sequence can be effectively reduced without blurring
edges or moving objects in the image sequence. Significant improvement in the picture
quality of low bit-rate coded video sequences (111) is thereby achieved.
BACKGROUND OF THE INVENTION
Field of the Invention 

The present invention relates generally to the field of digital video processing and, more particularly, to a
system and method for post-processing decompressed motion video sequences where a block-based
motion-compensated transform coding technique is used for compression and decompression.

Description of the Related Art 

Video images can be represented by a digital signal in which a series of bits of information is used to
represent each video frame. Where the bandwidth of a particular communication system is limited, such as in the
integrated services digital network (ISDN) or the public telephone network, low bit-rate image coding is
particularly useful for transmitting visual images and communications. As such, increasing demands have been
placed on the use of low bit-rate coding. Bit-rates between p × 9.6 kbit/s and p × 384 kbit/s are most
frequently used for low bit-rate transmissions. As demands increase, the picture quality of video images
generated through low bit-rate coding becomes critical.

Coded picture quality is generally determined by the type of coding technique used and the targeted bit
rate. Coding processes, however, have inherent loss characteristics. Image coding often results in noise or
spurious signals upon reconstruction of the image. Noise and spurious signals that occur as a result of such
imaging techniques are often referred to as artifacts. Artifacts can often reach a level where they appear in
the video image with as much strength as the signals produced by the real objects of the image. Moreover,
artifacts often become even more visible for low bit-rate transmissions.

The characteristics of these artifacts depend on the form of coding technique used. Currently, the most
well-known and popular low bit-rate coding technique involves block-based motion-compensated transform
coding. Such a coding technique is used in many image compression standards, such as the CCITT
(Consultative Committee on International Telegraphy and Telephony) Recommendation H.261, Draft Revised
Recommendation H.261 - Video Codec for Audiovisual Services at p × 64 kbit/s, Study Group XV - Report R95
(May 1992), herein incorporated by reference, and the more recent proposals arising out of the
Telecommunication Standardization Sector Study Group 15, Working Party 15/1 Expert's Group on Very Low
Bitrate Videophone (LBC-93), herein incorporated by reference.

This CCITT standard, although an image compression standard often used for multimedia applications,
produces highly noticeable artifacts upon reconstruction of an image at low bit-rate. Those artifacts are often
referred to as "blocking effects", "quantization noise" and the "mosquito phenomenon". Future standards using
block-based motion-compensated transform coding will likely also produce such artifacts.

"Blocking effects" are spatial domain distortions which appear as discontinuities on the edges of an image
and which yield average values of luminance and chrominance across a block boundary, a so-called spatial
artifact. These distortions are caused by using different coding parameters for adjacent blocks. For blocks
containing edges, edges can be discontinuous on block boundaries since each block is coded independently of
its neighbors. Similarly, for monotone blocks where the intensity in the original image changes gradually, the
intensity in the coded blocks can change abruptly from one block to another, due to the different quantization
parameters used for each block.

"Quantization noise" is a distortion caused by quantization processes. When a quantization step size is
small, the distortion caused by the quantization is called granular noise, a noise with a high spatial frequency
that is evenly distributed over an entire block. This noise is independent of the signal. When a quantization
step size is large, the quantization noise is signal-dependent since the signal can be all mapped to zero. Where
large quantization forces the high frequency components to zero, the artifacts can be described as "noise
contour". Various coefficients used in Discrete Cosine Transform (DCT) techniques may cause this distortion to
spread over the entire block when the edge block is transformed into the DCT domain, particularly in view of
the fact that DCT techniques are inefficient in representing edges compared with flat areas. Significant losses
in edge information and the creation of speckle-like distortions near the edges of a particular block may occur
as a direct result of coarse quantization of the DCT coefficients.

The "mosquito phenomenon" is a high frequency granular type of noise, which appears similar to a moving
cloud of insects, that is caused by motion compensation and quantization processes. This phenomenon is a
distortion which appears in the temporal domain, a so-called temporal artifact. These artifacts further degrade
reconstructed picture quality.

Therefore, to improve the reconstructed picture quality, postprocessing to reduce the aforementioned ar- 
tifacts is frequently necessary. 
There have been many nonlinear postprocessing techniques proposed for reducing coding artifacts and
for improving reconstructed picture quality. For example, in an article published by B. Ramamurthi and A.
Gersho, "Nonlinear Space-Variant Postprocessing of Block Coded Images", IEEE Trans. on Acoustics, Speech,
and Signal Processing, Vol. ASSP-34, No. 5, October 1986, the authors proposed a space-variant nonlinear
postprocessing technique.

Similarly, in articles published by R.L. Stevenson, "Reduction of Coding Artifacts in Transform Image
Coding", Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 1993; A.
Zakhor, "Iterative Procedures for Reduction of Blocking Effects in Transform Image Coding", IEEE
Transactions on Circuits and Systems for Video Technology, Vol. 2, No. 1, March 1992, pp. 91-95; and Lee et al.,
"Postprocessing of Video Sequence Using Motion Dependent Median Filters," SPIE Vol. 1606 Visual
Communications and Image Processing '91, Image Processing, July 1991, among others, the authors proposed
various techniques for improving picture quality.

However, the aforementioned proposed techniques were designed to reduce artifacts produced by either
still image coding techniques, where no motion compensation is employed, or other coding techniques which
do not utilize block-based motion-compensated transform coding.

In addition, in articles published by A. Nieminen, P. Heinonen, and Y. Neuvo, "A New Class of
Detail-Preserving Filters for Image Processing", IEEE Trans. on Pattern Analysis and Machine Intelligence,
Vol. PAMI-9, No. 1, Jan. 1987, and P. Heinonen, "FIR-Median Hybrid Filters", IEEE Trans. on Acoustics, Speech
and Signal Processing, Vol. ASSP-35, No. 6, June 1987, various filtering techniques have also been proposed.

Different coding techniques produce different artifacts. As such, motion-compensated coding techniques
often introduce different spatial and temporal artifacts on the reconstructed picture. Techniques proposed by
these various authors are unnecessarily complex and often fail to account for the different spatial and temporal
artifacts produced by block-based motion-compensated transform coding. As such, the various techniques
proposed to date are not particularly efficient for eliminating spatial and temporal artifacts from decompressed
signals generated by block-based motion-compensated transform coding.

Features and Advantages of the Invention 

The present invention achieves many advantages and contains features for efficiently reducing artifacts
introduced into video images by block-based motion-compensated transform coding. One feature of the
present invention includes the ability to optimize video images transmitted at low bit-rates. Another feature
includes the ability to adapt spatial and temporal filtering to local signal statistics.

These and other features of the present invention achieve significant reduction in coding artifacts without
blurring or distorting (1) the edges of video images and (2) moving objects in the image sequence.

In addition, the present invention achieves the advantages associated with the use of low-cost filtering
techniques, which are not unnecessarily complex, to optimize video picture quality.

Summary of the Invention 

The present invention provides a system and method for enhancing decompressed motion video sequences,
where the compression technique involves block-based motion-compensated transform coding.

In one embodiment of the present invention, a separable 3D filtering structure is used: space-variant
FIR-Median hybrid filtering is used in the spatial domain, followed by motion-compensated nonlinear filtering
in the temporal domain. By using this structure, the coding artifacts in the reconstructed image sequence can be
effectively reduced without blurring edges or moving objects in the image sequence.

In addition, the present invention provides spatial and temporal operations which are adaptive to the local 
statistics of the signal, both spatially and temporally. One-dimensional spatial operations for edge oriented pix- 
els and two-dimensional spatial operations for nonedge pixels are followed by motion compensated nonlinear 
temporal filtering to optimize a reconstructed picture. 

In particular, the present invention provides spatial operations which switch between linear and nonlinear
filtering depending on the spatial local statistics of the signal. Specifically, a two-dimensional low-pass filter
is used for flat areas and one-dimensional space-variant FIR-Median hybrid filters are used for edge areas.
The FIR-Median hybrid filter is designed so its root structure is parallel with the direction of the edge
orientation. The particular use of one-dimensional FIR-Median Hybrid filters with root structures parallel to the
edge orientation achieves significant reduction in spatial artifacts on edge areas, thereby enhancing the
definition and clarity of the edges without blurring the image.

In addition, the present invention includes a system and method for adaptive motion-compensated frame 
averaging, thereby reducing temporal artifacts without causing blurring of a moving object. 

These, together with other features and advantages which will be subsequently apparent, reside in the
details of construction and operation as more fully hereinafter described and claimed, with reference being
had to the accompanying drawings forming a part hereof, wherein like numerals refer to like elements
throughout.

Brief Description of The Drawings 

Figure 1 is a block diagram illustrating the basic operation of a video decoder with the postprocessing
filtering system and method of the preferred embodiment of the present invention.

Figure 2 is a block diagram illustrating the postprocessing filtering system and method of the preferred
embodiment of the present invention.

Figure 3 is a block diagram illustrating the system and method of edge orientation detection performed
by the preferred embodiment of the present invention.

Figure 4 is a block diagram illustrating the system and method of spatial filtering performed by the preferred
embodiment of the present invention.

Figure 5 is a diagram of the line structures parallel to the root structures of the spatial filters depicted in
Figure 4.

Figure 6 is a diagram of the one-dimensional Finite-Impulse-Response-Median hybrid (FMH) filter utilized
in the preferred embodiment of the present invention.

Figure 7 is a diagram of the directions for use in determining the average taking of the Finite Impulse
Response (FIR) linear phase subfilters of the preferred embodiment of the present invention.

Figure 8 is a block diagram of the temporal filtering and storage system and method of the preferred
embodiment of the invention.

Figure 9 is a chart illustrating the weighting factor coefficients of the temporal filter, α and β, of the
preferred embodiment of the present invention.

Detailed Description of The Preferred Embodiment 

Figure 1 is a simplified block diagram of a video decoder in use with the postfilter implementing the system
and techniques of the present invention. A video bitstream 98 is supplied as input to the decoder 99, where
the bitstream is decoded and the reconstructed digital video sequence 101 is produced. Signals 112 are motion
vectors decoded from the video bitstream at decoder 99. The decoder 99 uses a block-based
motion-compensated transform coding technique such as the technique utilized in H.261. The reconstructed
digital video signal 101 and the motion information 112 are supplied to the postfilter 100 to produce a
coding-artifacts-reduced video sequence 111. The coding-artifacts-reduced video sequence 111 is then displayed
on a display device 113. Postfilter 100 is a postprocessor in that it processes information from a decoded and
reconstructed video sequence 101.

Figure 2 is a block diagram of the postfilter 100 implementing the techniques of the present invention. The
postfilter 100 comprises generally the following components: an edge orientation detector 102, an
edge/nonedge decision subprocessor 104, a spatial filter bank 106, a frame buffer 108, and a temporal filter
110, each of which will be described in greater detail below.

Postfilter 100, which implements all of the decision processes of the present invention, is preferably a digital
signal processor such as a model DSP 3210 manufactured by AT&T. An 80386 type processor manufactured
by Intel® and other types of processors may also be used. In the alternative, each of the aforementioned
components which implement various portions of the filtering processes of postfilter 100 may be separate
processors.

With continuing reference to Figure 2, the general operation of the postfilter 100 will now be described. A
reconstructed video sequence 101 is first supplied as input to block 102, an edge orientation detector, where
the edge orientation of each pixel in a video frame is detected. Edge orientation information is supplied as an
input signal 103 to block 104, an edge/nonedge decision subprocessor, where the edge orientation signal 103
is compared with a predetermined threshold $T_g$ to produce a control signal 105.

The control signal 105 controls a bank of spatial filters at block 106. At block 106, the input signal 101 is
filtered by the filter selected according to signal 105. The spatial filter bank at block 106 produces a signal
107 which comprises spatially filtered video frames. Spatially filtered video frames 107 are then stored at buffer
108. Buffer 108 is designed to store within its memory at least three temporal frames for subsequent temporal
filtering. The three temporal frames consist of at least the following frames: a current frame, a previous frame,
and a future frame. Buffer 108 yields its stored video frames as signal 109. At block 110, the current video
frame is filtered by using motion information, from signal 112, and the previous and the future spatially filtered
frames to produce signal 111. Signal 111 is the final filtered video signal which is then transmitted to a display
device 113.

Figure 3 is a block diagram of an edge orientation detector 102. An unfiltered video frame is represented
as f(x,y). A pixel in this unfiltered video frame is represented as p(x,y). For each pixel p(x,y), the edge
orientation can be at one of four angles: 0°, 45°, 90°, and 135°. To compute the edge orientation at each point
(x,y), a set of template gradient impulse response arrays $H_i$ is used. The template gradient impulse response
arrays $\{H_i,\ i = 1, \ldots, 4\}$ are defined by equations (1) and (2), whose array entries are not legible in
this copy.
$H_1$ is the impulse response for the horizontal (0°) gradient, $H_2$ is the impulse response for the 45° gradient,
$H_3$ is the impulse response for the vertical (90°) gradient and $H_4$ is the impulse response for the 135°
gradient. The edge template gradient at any point (x,y) is defined as

$$G_{\max}(x,y) = \max\left[\,|G_1(x,y)|,\ |G_2(x,y)|,\ |G_3(x,y)|,\ |G_4(x,y)|\,\right] \qquad (3)$$

where

$$G_i(x,y) = f(x,y) \circledast H_i \qquad (4)$$

is the gradient in the i-th equispaced direction, obtained by convolving the image f(x,y) with the gradient
impulse response array $H_i$, $i = 1, \ldots, 4$. The edge angle is determined by the direction of the largest
gradient $G_{\max}(x,y)$ in signal 103. Edge orientation detector 102 performs the above processing to determine
the edge orientation of each pixel in the video bitstream 98 decoded by video decoder 99.

Signal 103 is supplied to the edge/nonedge decision subprocessor 104 to produce control signal 105. At
edge/nonedge decision subprocessor 104, in order to determine whether the pixel belongs to the edge or
nonedge class, the maximum gradient $G_{\max}(x,y)$ of the pixel p(x,y) in signal 103 is compared with a
predetermined threshold $T_g$. The predetermined threshold $T_g$ is a value selected from experimental data. In
the preferred embodiment, where H.261 guidelines are used, the pixel value is in the range of 0 to 255.
Accordingly, in the preferred embodiment $T_g$ is equal to a threshold of 20. If $G_{\max}(x,y) > T_g$, then the
pixel at position (x,y) belongs to the edge class with an edge orientation determined by the direction of the
greatest gradient $G_{\max}(x,y)$; otherwise, the pixel at (x,y) belongs to the nonedge class.
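
As a rough illustration of the detection and thresholding steps described above, the following Python sketch computes four template responses at one pixel, takes the largest magnitude as the maximum gradient, and compares it with the threshold of 20. The 3x3 Prewitt-style compass templates, the `img[y][x]` indexing, and the names `TEMPLATES` and `classify_pixel` are assumptions made for illustration only; they are not the arrays of the specification.

```python
# Hypothetical 3x3 compass templates, keyed by the angle they are taken to
# represent (assumption: the specification's own arrays are not available).
TEMPLATES = {
    0:   [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],
    45:  [[0, 1, 1], [-1, 0, 1], [-1, -1, 0]],
    90:  [[1, 1, 1], [0, 0, 0], [-1, -1, -1]],
    135: [[1, 1, 0], [1, 0, -1], [0, -1, -1]],
}

T_G = 20  # threshold stated in the text for 8-bit pixel values


def classify_pixel(img, x, y, t_g=T_G):
    """Return the edge angle (0, 45, 90 or 135) or None for a nonedge pixel.

    img is a list of rows (img[y][x]); (x, y) must not lie on the border.
    The response is a correlation; because only its magnitude is used and
    these templates are antisymmetric, convolution gives the same result.
    """
    best_angle, g_max = None, 0
    for angle, h in TEMPLATES.items():
        g = sum(h[dy + 1][dx + 1] * img[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1))
        if abs(g) > g_max:
            best_angle, g_max = angle, abs(g)
    # Edge class only when the largest gradient exceeds the threshold.
    return best_angle if g_max > t_g else None
```

Under these assumed templates, a pixel next to a strong left-to-right intensity step is assigned the 0° class, while a pixel in a uniform area is classified as nonedge.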

The edge or nonedge indication signal 105 is supplied to spatial filter bank 106. The edge/nonedge indi- 
cation signal 105 indicates the specific edge orientation (i.e., 0°, 45°, 90°, 135°) or that the signal is not an 
edge signal. 

Figure 4 is a block diagram of the spatial filter bank 106. The spatial filter bank 106 preferably consists of
five filters. Filter 1 to Filter 4 are one-dimensional (1D) FIR-Median Hybrid (FMH) filters with root structures
parallel to the line structures shown in Figure 5.

If the edge orientation is determined to be 0° (Figure 5a), filter 1 is selected; if the edge orientation is 45° 
(Figure 5b), filter 2 is selected; if the edge orientation is 90° (Figure 5c), filter 3 is selected; if the edge orien- 
tation is 135° (Figure 5d), filter 4 is selected. For a nonedge pixel, filter 5 is selected. The decision to process 
the signal as an edge or a nonedge signal is controlled by edge/nonedge signal 105, as discussed above. 

Generally, the 1D FMH filter used in the preferred embodiment of the present invention consists of 3 linear
phase FIR filters and a median filter, although it is to be understood that the number of the FIR filters may be
altered. The transfer functions of the three linear phase FIR filters of the preferred embodiment are as follows:

$$H_1(z) = \frac{1}{L}\left[z^{1} + z^{2} + \cdots + z^{L}\right] \qquad (5)$$

$$H_2(z) = 1 \qquad (6)$$

$$H_3(z) = \frac{1}{L}\left[z^{-1} + z^{-2} + \cdots + z^{-L}\right] \qquad (7)$$

where the window length is (2L+1). For example, if p(c) is the center pixel along a line within a window having
a length of (2L+1), then the outputs of the three FIR filters are

$$p_1 = \frac{1}{L}\left[p(c+1) + p(c+2) + \cdots + p(c+L)\right] \qquad (8)$$

$$p_2 = p(c) \qquad (9)$$

$$p_3 = \frac{1}{L}\left[p(c-1) + p(c-2) + \cdots + p(c-L)\right] \qquad (10)$$

Essentially, therefore, the output of the first FIR filter $p_1$ is a function of the sum of all pixels along a line
from (but not including) center pixel p(c) to the pixel at the edge of the window, p(c+L). Similarly, the output
of the third FIR filter $p_3$ is a function of the sum of all pixels along the same line from (but not including)
center pixel p(c) to the pixel at the opposite edge of the window, p(c-L). Meanwhile, the output of the center
filter, $p_2$, is simply equal to center pixel p(c).

These filters are one-dimensional in that only one variable is utilized for incremental movement (pixel by 
pixel) along a predetermined fixed line. Thus, even though movement may occur in both the x and y directions 
(when a predetermined diagonal line is utilized) of a two-dimensional x-y coordinate frame, these filters utilize 
only a one-dimensional reference frame and therefore achieve the simplicity of a one-dimensional linear fil- 
tering technique. 

The output of the FMH filter is

$$p_{\text{output}} = \operatorname{median}(p_1, p_2, p_3) \qquad (11)$$

A diagram of the 1D FMH filter with a window of size 5 is shown in Figure 6.
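
The one-dimensional FMH operation of equations (8) to (11) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the preferred embodiment; the function name `fmh_1d` and the choice to leave border handling to the caller are assumptions.

```python
from statistics import median


def fmh_1d(line, c, L=2):
    """FIR-median hybrid output at index c of a pixel sequence (window 2L+1).

    The caller must ensure L <= c < len(line) - L; borders are not handled
    here (a negative index would silently wrap around in Python).
    """
    p1 = sum(line[c + k] for k in range(1, L + 1)) / L  # eq. (8): forward average
    p2 = line[c]                                        # eq. (9): center pixel
    p3 = sum(line[c - k] for k in range(1, L + 1)) / L  # eq. (10): backward average
    return median([p1, p2, p3])                         # eq. (11)
```

With L = 2 (the window of size 5 of Figure 6), an isolated spike such as the middle sample of `[10, 10, 200, 10, 10]` is replaced by 10, whereas the sample just before a step in `[10, 10, 10, 90, 90]` keeps its value of 10, so the step is not smeared.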

In the preferred embodiment of the present invention, an index i, where i can be E, W, S, N, NE, SW, NW,
and SE, is utilized to indicate the direction where the FIR filter is operated and measured from a central input
pixel p(x,y). The output of each of the three FIR linear phase subfilters $p_i(x,y)$ is defined as the average
taken over the linear direction i. A diagram of the linear directions E, W, S, N, NE, SW, NW, and SE, used in the
preferred embodiment of the present invention is shown in Figure 7. It is to be understood, however, that
additional directions may be utilized.

The outputs of the FIR linear phase subfilters $p_i(x,y)$ in the directions shown in Figure 7 are therefore defined
as

$$p_E(x,y) = \frac{1}{L}\left[p(x+1,y) + p(x+2,y) + \cdots + p(x+L,y)\right] \qquad (12)$$

$$p_W(x,y) = \frac{1}{L}\left[p(x-1,y) + p(x-2,y) + \cdots + p(x-L,y)\right] \qquad (13)$$

$$p_S(x,y) = \frac{1}{L}\left[p(x,y-1) + p(x,y-2) + \cdots + p(x,y-L)\right] \qquad (14)$$

$$p_N(x,y) = \frac{1}{L}\left[p(x,y+1) + p(x,y+2) + \cdots + p(x,y+L)\right] \qquad (15)$$

$$p_{NE}(x,y) = \frac{1}{L}\left[p(x+1,y+1) + p(x+2,y+2) + \cdots + p(x+L,y+L)\right] \qquad (16)$$

$$p_{SW}(x,y) = \frac{1}{L}\left[p(x-1,y-1) + p(x-2,y-2) + \cdots + p(x-L,y-L)\right] \qquad (17)$$

$$p_{NW}(x,y) = \frac{1}{L}\left[p(x-1,y+1) + p(x-2,y+2) + \cdots + p(x-L,y+L)\right] \qquad (18)$$

$$p_{SE}(x,y) = \frac{1}{L}\left[p(x+1,y-1) + p(x+2,y-2) + \cdots + p(x+L,y-L)\right] \qquad (19)$$

However, it is to be understood that although equations 12-19 define the outputs of the above-noted FIR
linear phase subfilters in a two-dimensional x-y coordinate frame, the output of the filters is one-dimensional
in that the average of all pixels along a direction from (but not including) a center pixel to the pixel at the edge
of the window is calculated, thereby utilizing the one-dimensional linear filtering techniques of equations
8-10.

As there are four edge classes with edge orientations 0°, 45°, 90°, and 135°, the FMH filter with different
root structures is applied to the pixels in each class. With p(x,y) as the center pixel within a square window
with size (2L+1)x(2L+1), the output of filter 1 to filter 4 is defined as

$$p_{\text{filter1}}(x,y) = \operatorname{median}\left[p_E(x,y),\ p_W(x,y),\ p(x,y)\right] \qquad (20)$$

$$p_{\text{filter2}}(x,y) = \operatorname{median}\left[p_{NE}(x,y),\ p_{SW}(x,y),\ p(x,y)\right] \qquad (21)$$

$$p_{\text{filter3}}(x,y) = \operatorname{median}\left[p_N(x,y),\ p_S(x,y),\ p(x,y)\right] \qquad (22)$$

$$p_{\text{filter4}}(x,y) = \operatorname{median}\left[p_{NW}(x,y),\ p_{SE}(x,y),\ p(x,y)\right] \qquad (23)$$

where $p_{\text{filter}\,k}(x,y)$ is the overall output taken over the median of the outputs of three FIR linear phase
subfilters $p_i(x,y)$ along the direction of the determined edge orientation.
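
The selection among filters 1 to 4 can be illustrated with a short Python sketch that averages along the two opposite half-lines of the detected orientation and takes the median with the center pixel, following equations (12) to (23). The `img[y][x]` storage convention, the table `DIRECTIONS`, and the function name `directional_fmh` are assumptions for illustration; the caller is assumed to keep the window inside the image.

```python
from statistics import median

# Unit steps (dx, dy) for the two half-lines of each edge orientation.
# "N" means increasing y, as in equations (15), (16) and (18); with rows
# stored top to bottom this is the next row in the list (an assumption).
DIRECTIONS = {
    0:   ((1, 0), (-1, 0)),    # E and W   -> filter 1, eq. (20)
    45:  ((1, 1), (-1, -1)),   # NE and SW -> filter 2, eq. (21)
    90:  ((0, 1), (0, -1)),    # N and S   -> filter 3, eq. (22)
    135: ((-1, 1), (1, -1)),   # NW and SE -> filter 4, eq. (23)
}


def directional_fmh(img, x, y, angle, L=2):
    """Median of the two directional averages and the center pixel."""
    def half_line_mean(dx, dy):
        # Average of the L pixels from (but not including) the center outward.
        return sum(img[y + k * dy][x + k * dx] for k in range(1, L + 1)) / L

    first, second = DIRECTIONS[angle]
    return median([half_line_mean(*first), half_line_mean(*second), img[y][x]])
```

For a horizontal line of value 100 with a corrupted center pixel of 160, filtering along 0° restores 100, since the averaging runs along the line rather than across it.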

While the aforementioned filtering techniques are used for edge pixels, a two-dimensional (2D) low-pass
filter 5 is used for nonedge pixels. If p(x,y) is the center pixel within a square window with size
(2L+1) x (2L+1), then the output of the filter is given by equation (24), a double summation over the window
whose upper limits are L in both dimensions; the full expression is not legible in this copy.



To further reduce the coding artifacts, the spatially filtered video sequence is further filtered temporally at
block 110. Motion-compensated temporal filtering is performed particularly because such filtering will
enhance artifact reduction in the moving areas of an image. To implement motion-compensated temporal filtering,
motion information is needed. Theoretically, estimates regarding either forward motion information or
backward motion information from the reconstructed image sequences may be utilized. However, as motion
estimation can often be very complicated and therefore very expensive to compute, it is advantageous to not
compute the motion vectors separately. Instead, the motion vectors extracted for decoding can be stored and
reused for postprocessing. In most low bit-rate coding techniques such as H.261, block-based forward motion
vectors are available, and these motion vectors can be extracted from the coded bitstream and stored for later
postprocessing. The motion information used here is represented by signal 112. Additional disclosure regarding
such motion vectors may be found within CCITT Recommendation H.261 and a publication by A. Netravali and
B. Haskell, "Digital Pictures Representation and Compression," Plenum Press: New York, 1989.

Figure 8 shows the diagram of the temporal filter block 110 and frame buffer 108. As temporal filtering
involves more than one frame, frame buffer 108 is used to store spatially filtered video frames 107. In the
preferred embodiment, frame buffer 108 can store at least 3 frames: the current frame $f_n$, the previous frame
$f_{n-1}$, and the future frame $f_{n+1}$. Buffer 108 yields these stored frames as signal 109 to temporal filter 110.

The pixel at spatial location (i,j) in frame n is defined as $p_n(i,j)$. Similarly, the pixel in frame n+1 is defined
as $p_{n+1}(i,j)$, and $\hat{p}_{n-1}(i,j)$ is the pixel in the previous best matching frame $\hat{f}_{n-1}$. The previous
best matching frame consists of macroblocks from either the previous frame n-1 or the motion-compensated frame.
$\hat{p}_{n-1}(i,j)$ is then defined as

$$\hat{p}_{n-1}(i,j) = \begin{cases} p_{n-1}(i,j) & \text{if } no\_mc\_SE < mc\_SE, \\ p^{mc}_{n-1}(i,j) & \text{otherwise,} \end{cases} \qquad (25)$$

where $p^{mc}_{n-1}(i,j)$ is the pixel from the motion-compensated macroblock of a 16 x 16 group of pixels, and
$p_{n-1}(i,j)$ is the pixel from the previous frame n-1. mc_SE and no_mc_SE are the error powers between the
current block and the motion-compensated block and the previous block, respectively. These terms are defined
as follows:

$$mc\_SE = \sum_{i=1}^{16}\sum_{j=1}^{16}\left[p_n(i,j) - p^{mc}_{n-1}(i,j)\right]^2 \qquad (26)$$

$$no\_mc\_SE = \sum_{i=1}^{16}\sum_{j=1}^{16}\left[p_n(i,j) - p_{n-1}(i,j)\right]^2 \qquad (27)$$
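
The per-macroblock choice between the previous block and the motion-compensated block can be sketched as follows. This is a minimal Python illustration of equations (25) to (27) under the assumption that each block is a list of rows of pixel values; the names `squared_error` and `best_matching_block` are hypothetical, and the functions work for any block size, although the text uses 16 x 16 macroblocks.

```python
def squared_error(cur, ref):
    """Sum of squared pixel differences between two equally sized blocks."""
    return sum((c - r) ** 2
               for row_c, row_r in zip(cur, ref)
               for c, r in zip(row_c, row_r))


def best_matching_block(cur, prev, mc):
    """Pick the block of the previous best matching frame, as in eq. (25).

    cur  : block of the current frame n
    prev : co-located block of the previous frame n-1
    mc   : motion-compensated block pointed to by the decoded motion vector
    """
    no_mc_se = squared_error(cur, prev)  # eq. (27)
    mc_se = squared_error(cur, mc)       # eq. (26)
    # The uncompensated block is used only when it is strictly the better match.
    return prev if no_mc_se < mc_se else mc
```

Because the motion vectors are taken from the decoder rather than re-estimated, this comparison acts as a cheap check on whether the decoded vector actually improves the match for a given macroblock.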

With continuing reference to Figure 8, the frame difference e between the current frame $f_n$ and the previous
best matching frame $\hat{f}_{n-1}$ is calculated at first temporal subprocessor 20. The frame difference e at
the location (i,j) is computed as

$$e(i,j) = p_n(i,j) - \hat{p}_{n-1}(i,j) \qquad (28)$$

The frame difference e(i,j) between the current frame $f_n$ and the previous best matching frame $\hat{f}_{n-1}$ is
sent to nonlinear mapper 22 where the coefficient α, a weighting factor, is determined based on the frame
difference e(i,j).

The weighting factors α(i,j) for each pixel in an i-by-j matrix of pixels are multiplied by the frame difference
e for each pixel by a second temporal subprocessor 21 and then added by a third temporal subprocessor 23
to the corresponding pixel of the previous best matching frame $\hat{f}_{n-1}$ to obtain an intermediate weighted
average current pixel $\bar{p}_n(i,j)$ of an intermediate weighted average current frame $\bar{f}_n$.

A second frame difference e' between the weighted average pixel $\bar{p}_n(i,j)$ and the pixel $p_{n+1}(i,j)$ from
the future frame $f_{n+1}$ is computed by a fourth temporal subprocessor 24 as

$$e'(i,j) = \bar{p}_n(i,j) - p_{n+1}(i,j) \qquad (29)$$

The second frame difference e' is then used by nonlinear mapper 25 to obtain a second weighting factor β
in the same manner as α was obtained.

The weighting factors β(i,j) for each pixel in an i-by-j matrix of pixels are multiplied by the frame difference
e' for each pixel by a fifth temporal subprocessor 26.

A filtered pixel $\tilde{p}_n(i,j)$ is calculated at sixth temporal subprocessor 27 as a weighted average of the
intermediate pixel $\bar{p}_n(i,j)$ and the future adjacent pixel $p_{n+1}(i,j)$.
The function(s) of each of the aforementioned temporal subprocessors 20, 21, 23, 24, 26, and 27 as well
as the nonlinear mappers 22 and 25 are preferably implemented by the processor which performs the
functions of postfilter 100 or, in the alternative, may be implemented by separate standard processors such as
an 80386.

The coefficients of the temporal filter, α and β, follow the nonlinear relationship shown in Figure 9. $T_a$ and
$T_b$ are predetermined thresholds determined from experimental data. In the preferred embodiment where
H.261 guidelines are used, the pixel value is in the range of 0 to 255. Accordingly, in the preferred embodiment,
α is selected from the range 0.1 to 1, β is selected from the range 0.1 to 1, $T_a$ is equal to a threshold of 9, and
$T_b$ is equal to a threshold of 40. If e or e' is less than $T_a$, meaning that there is not much movement in the
current pixel, or the motion of the current pixel is well tracked, then simple pixel averaging is applied. If e or e'
is larger than $T_b$, indicating that there is a scene change or there is fast movement which cannot be tracked by
the motion vectors, no operation is applied. If e or e' is between $T_a$ and $T_b$, the current pixel is in a transition
area and weighted averaging is applied. By using this operation, the moving portion of an image can be
well preserved and noise can be effectively reduced.

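The pixel in the filtered frame can be obtained by using the following equations:

$$\bar{p}_n(i,j) = \alpha\, p_n(i,j) + (1-\alpha)\,\hat{p}_{n-1}(i,j) \qquad (30)$$

$$\tilde{p}_n(i,j) = \beta\,\bar{p}_n(i,j) + (1-\beta)\,p_{n+1}(i,j) \qquad (31)$$

$$\tilde{p}_n(i,j) = \beta(1-\alpha)\,\hat{p}_{n-1}(i,j) + \beta\alpha\, p_n(i,j) + (1-\beta)\,p_{n+1}(i,j) \qquad (32)$$

where $\hat{p}_{n-1}(i,j)$ is the pixel in the previous best matching frame, $\bar{p}_n(i,j)$ is the intermediate weighted
average pixel, and $\tilde{p}_n(i,j)$ is the pixel in the filtered frame. Filtered signal 111 contains a sequence of
filtered pixels $\tilde{p}_n(i,j)$. This filtered signal 111 is then sent to the display device 113.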
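
The two-stage temporal averaging of equations (30) and (31) can be sketched per pixel in Python. The thresholds 9 and 40 come from the text; everything about the shape of the weighting curve is an assumption, because Figure 9 is not reproduced here: the sketch compares the magnitude of the frame difference with the thresholds, uses a weight of 0.5 ("simple pixel averaging") below the lower threshold, a weight of 1 (no operation) above the upper threshold, and a linear ramp in between. The names `weight` and `temporal_filter_pixel` are hypothetical.

```python
T_A, T_B = 9, 40  # thresholds stated in the text for 8-bit pixel values


def weight(diff, t_a=T_A, t_b=T_B, w_low=0.5):
    """Map a frame difference to a weighting factor (assumed curve shape)."""
    d = abs(diff)
    if d <= t_a:
        return w_low   # little motion: plain averaging (assumed weight 0.5)
    if d >= t_b:
        return 1.0     # scene change or untracked motion: keep the pixel
    # Transition area: assumed linear ramp between the two thresholds.
    return w_low + (1.0 - w_low) * (d - t_a) / (t_b - t_a)


def temporal_filter_pixel(p_cur, p_prev_best, p_next):
    """Filter one pixel using the previous best matching and future pixels."""
    alpha = weight(p_cur - p_prev_best)                  # from e, eq. (28)
    p_mid = alpha * p_cur + (1 - alpha) * p_prev_best    # eq. (30)
    beta = weight(p_mid - p_next)                        # from e', eq. (29)
    return beta * p_mid + (1 - beta) * p_next            # eq. (31)
```

With this assumed curve, a pixel whose temporal neighbors differ only slightly is averaged with them, whereas a pixel that differs from both neighbors by more than the upper threshold passes through unchanged, which is how moving regions and scene changes avoid blurring.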

Therefore, in summary, the features of the present invention include the use of a separable 3D filtering 
structure: space-variant FIR-Median hybrid filtering is used in the spatial domain, followed by motion-compen- 
sated nonlinear filtering in the temporal domain. As such, the present invention provides spatial and temporal 
operations which are adaptive to the local statistics of the signal, both spatially and temporally. In particular, 
spatial operations switch between linear and nonlinear filtering depending on the edge/nonedge orientation 
of a signal and temporal operations account for motion compensated signals. Moreover, the aforementioned 
filtering structure optimizes the filtering process by using cost-efficient and simplified one-dimensional spatial 
filtering techniques. 

The many features and advantages of the invention are apparent from the detailed specification, and thus,
it is intended by the appended claims to cover all such features and advantages of the invention which fall within
the true spirit and scope of the invention. Further, since numerous modifications and variations will readily
occur to those skilled in the art, it is not desired to limit the invention to the exact construction and operation
illustrated and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling
within the scope of the invention.
Claims 

1. A filtering system for reducing artifacts in motion video sequences generated by block-based 
motion-compensated transform coding from a video decoder, comprising: 

a postprocessor connected to said video decoder, said postprocessor including: an edge orientation 
detector; a spatial filter bank, said spatial filter bank being configured to receive information from said 
edge orientation detector, said filter bank comprising a one-dimensional filter utilizing said information to 
generate spatially filtered video sequences; 

and a motion-compensated temporal filter, said motion-compensated temporal filter receiving 
spatially filtered video sequences generated by said spatial filter bank, said motion-compensated temporal 
filter being configured to generate temporally filtered video sequences from said spatially filtered video 
sequences; and 

a frame memory connected to said postprocessor, said frame memory being arranged to receive 
spatially filtered video sequences from said spatial filter bank. 

2. The filtering system of claim 1 wherein said edge orientation detector utilizes a set of template gradient 
impulse response arrays to compute edge orientation. 

3. The filtering system of claim 1 wherein said spatial filter bank comprises at least two finite impulse 
response linear phase filters and a median filter. 

4. The filtering system of claim 1 wherein said spatial filter bank further comprises a two-dimensional 
low-pass filter for filtering nonedge pixels in said motion video sequence. 

5. The filtering system of claim 1 wherein said frame memory is configured to store signals for a current 
frame, an adjacent prior frame and an adjacent future frame. 

6. The filtering system of claim 5 wherein said motion-compensated temporal filter is configured to: 

determine a best matching frame from said adjacent prior frame and a motion-compensated frame; and 

calculate a frame difference between a current frame and said best matching frame. 

7. A system for filtering decoded noise-contaminated signals for video communication, comprising: 

a computer processor; 

means for filtering spatial artifacts from decoded noise-contaminated signals, said means for 
filtering spatial artifacts comprising an edge detector and a one-dimensional filter; 

means for storing signals from said decoded noise-contaminated signals for at least a current 
frame, a prior frame and a future frame; 

means for calculating a best matching frame from said prior frame and a motion-compensated frame; 

means for calculating an intermediate weighted average current frame from said current frame and 
a best matching frame, such that the best matching frame is given less weight as the difference between 
said intermediate weighted average current frame and said motion-compensated prior frame increases; 
and 

means for calculating a filtered frame from a weighted average of the intermediate weighted 
average current frame and said future frame, such that said future frame is given less weight as the difference 
between said intermediate weighted average current frame and said future frame increases. 

8. A three dimensional filter system for enhancing decompressed motion video sequences generated by 
block-based motion-compensated transform coding, comprising: 

a one-dimensional space-variant FIR-median hybrid filter, said one-dimensional space-variant 
FIR-median hybrid filter being structured to reduce the effect of spatial artifacts generated by block-based 
motion-compensated transform coding on said motion video sequences; 

a memory to store at least a portion of said motion video sequences, said memory being connected 
to said one-dimensional space-variant FIR-median hybrid filter; and 

a motion-compensated nonlinear filter, said motion-compensated nonlinear filter being structured 
to reduce the effect of temporal artifacts generated by block-based motion-compensated transform 
coding on said motion video sequences; 

wherein said one-dimensional space-variant FIR-median hybrid filter, said memory and said 
motion-compensated nonlinear filter are arranged in series. 

9. The filter system of claim 8 further comprising: 

a one-dimensional edge detector, said edge detector being configured to input information to said 
space-variant FIR-median hybrid filter to reduce the effect of spatial artifacts. 

10. A method of processing and displaying decoded video image sequence signals generated by block-based 
motion-compensated transform coding, comprising the steps of: 

detecting the edge orientation of a pixel of said video image sequences; 

filtering spatial artifacts from said video image sequences using a one-dimensional filter for pixels 
determined to have an orientation in an edge region of said video sequence signals; 

storing at least a current frame, an adjacent prior frame and an adjacent future frame from said 
video image sequence signals; 

calculating a temporally filtered frame from a weighted average of said adjacent future frame, said 
current frame and a motion-compensated adjacent prior frame; and 

displaying said temporally filtered frame. 

11. The method of claim 10 wherein said step of calculating a temporally filtered frame further comprises the 
steps of: 

calculating a best matching frame; 

calculating a frame difference between said current frame and said best matching frame; 
determining a weighting factor based, at least in part, on said frame difference; 
calculating a weighted average current frame from the current frame and said best matching frame, 
based, at least in part, on the weighting factor. 

12. The method of claim 10 wherein said step of detecting the edge orientation of a pixel comprises the steps 
of: 

convolving said pixel with a gradient impulse response array; 

determining a maximum gradient; and 

comparing said maximum gradient to a threshold to determine whether a pixel belongs in an edge 
region or a nonedge region. 
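
By way of illustration only, the detection steps recited in claim 12 can be sketched as follows. The four 3x3 compass-style templates, their names and the threshold value are assumptions chosen for the example and are not the template gradient impulse response arrays of the specification. The sum below is written as a correlation; for these antisymmetric templates a true convolution differs only in sign, which the absolute value removes.

```python
# Hypothetical 3x3 directional gradient templates (illustrative only).
TEMPLATES = {
    "horizontal": [[1, 1, 1], [0, 0, 0], [-1, -1, -1]],
    "vertical":   [[1, 0, -1], [1, 0, -1], [1, 0, -1]],
    "diag_45":    [[0, 1, 1], [-1, 0, 1], [-1, -1, 0]],
    "diag_135":   [[1, 1, 0], [1, 0, -1], [0, -1, -1]],
}


def classify_pixel(frame, i, j, threshold):
    """Return ("edge", orientation) or ("nonedge", None) for interior pixel (i, j)."""
    best_name, best_mag = None, -1.0
    for name, tpl in TEMPLATES.items():
        # Response of this template over the 3x3 neighbourhood of (i, j).
        g = sum(tpl[di + 1][dj + 1] * frame[i + di][j + dj]
                for di in (-1, 0, 1) for dj in (-1, 0, 1))
        if abs(g) > best_mag:
            best_name, best_mag = name, abs(g)
    # The maximum gradient is compared to the threshold.
    if best_mag > threshold:
        return ("edge", best_name)
    return ("nonedge", None)
```

For a patch whose intensity changes from left to right, the "vertical" template gives the largest response, so the pixel is classified as lying on a vertical edge; a flat patch yields zero response for every template and is classified as nonedge.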

13. The method of claim 10 wherein said step of filtering spatial artifacts from said video image sequences 
comprises the step of: 

filtering spatial artifacts from said video image sequences using a two-dimensional low-pass filter 
for pixels determined to have an orientation in a nonedge region of said video sequences. 

14. The method of claim 10 wherein said step of filtering spatial artifacts further comprises the steps of: 

selecting a direction from a pixel at the edge region p(c+L) through a center pixel p(c) to a second 
pixel at the edge region p(c-L); 

calculating the output of a first finite impulse response filter as a function of a center pixel p(c) and 
all pixels from p(c) to p(c+L) along said direction; 

calculating the output of a second finite impulse response filter as a function of center pixel p(c); and 

calculating the output of a third finite impulse response filter as a function of all pixels from p(c-1) 
to p(c-L) along said direction; 

wherein L is equal to (window length - 1)/2. 
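
By way of illustration only, the three sub-filters recited in claim 14 can be combined in the manner of a conventional FIR-median hybrid filter, in which the output is the median of the three sub-filter outputs. The sketch below rests on assumptions: the forward and backward sub-filters are taken to be plain averages over p(c+1)..p(c+L) and p(c-L)..p(c-1), the second sub-filter is the center pixel itself, and the median combination is the standard FIR-median hybrid form rather than a step recited in the claim. The function name is hypothetical.

```python
from statistics import median


def fmh_filter_1d(samples, c, window_length):
    """FIR-median hybrid output at index c along one direction.

    samples       : pixel values taken along the selected direction
    window_length : odd window size; L = (window_length - 1) / 2
    """
    if window_length < 3 or window_length % 2 == 0:
        raise ValueError("window_length must be odd and at least 3")
    L = (window_length - 1) // 2
    if c - L < 0 or c + L >= len(samples):
        raise IndexError("window does not fit around index c")
    forward = sum(samples[c + 1:c + L + 1]) / L    # first sub-filter (assumed average)
    center = samples[c]                            # second sub-filter: center pixel
    backward = sum(samples[c - L:c]) / L           # third sub-filter (assumed average)
    return median([forward, center, backward])
```

On a step such as 10, 10, 10, 200, 200, 200 the output at the first high sample remains 200, so the edge is not smeared, whereas an isolated spike in a flat run is replaced by the value of its neighbours.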